LISTING OF THE CLAIMS
(including amendments, if any) 

1. (currently amended) A method implemented in a computer system, for clustering a string, 
the string including a plurality of characters, the method including: 

identifying R unique n-grams T1...R in the string;
for every unique n-gram Ts:

if a frequency of Ts in a set of n-gram statistics is not greater than a first threshold:
clustering the string with a cluster associated with Ts;

otherwise: 

for every other n-gram Tv in the string T1...R, except s:

if concluding that the frequency of n-gram Tv is greater than the first
threshold, and in response:
if the frequency of an n-gram pair Ts-Tv is not greater than a second
threshold:

clustering the string with a cluster associated with the n-gram pair
Ts-Tv;

otherwise:

for every other n-gram Tx in the string T1...R, except s and v:

clustering the string with a cluster associated with an n-gram
triple Ts-Tv-Tx;

otherwise: 
do nothing, 

where T1...R is a set of n-grams, R is the number of elements in T1...R; and Ts, Tv, and
Tx are members of T1...R.
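The nested loop of claim 1 can be read as the following Python sketch. It is an illustration under stated assumptions, not the claimed implementation: n-grams are taken to be character trigrams (`n = 3`), the n-gram and n-gram pair statistics are plain dictionaries of counts (`ngram_freq`, `pair_freq`), "not greater than" is read as `<=`, and cluster keys are sorted tuples so that Ts-Tv and Tv-Ts name the same cluster. The function names `unique_ngrams` and `cluster_string` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def unique_ngrams(s: str, n: int = 3) -> List[str]:
    """Unique character n-grams of s in first-seen order (n=3 is an assumption)."""
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def cluster_string(
    s: str,
    ngram_freq: Dict[str, int],
    pair_freq: Dict[FrozenSet[str], int],
    t1: int,
    t2: int,
    n: int = 3,
) -> Set[Tuple[str, ...]]:
    """Hypothetical helper: cluster keys that claim 1's loop would assign s to."""
    grams = unique_ngrams(s, n)
    keys: Set[Tuple[str, ...]] = set()
    for ts in grams:
        if ngram_freq.get(ts, 0) <= t1:
            # Ts is not greater than the first threshold: it keys a cluster by itself.
            keys.add((ts,))
            continue
        for tv in grams:
            if tv == ts or ngram_freq.get(tv, 0) <= t1:
                # Low-frequency Tv: "do nothing" (it is handled as its own Ts).
                continue
            if pair_freq.get(frozenset((ts, tv)), 0) <= t2:
                # Rare pair of two common n-grams: cluster on the pair.
                keys.add(tuple(sorted((ts, tv))))
            else:
                # Common pair: extend it with every other n-gram Tx.
                for tx in grams:
                    if tx not in (ts, tv):
                        keys.add(tuple(sorted((ts, tv, tx))))
    return keys
```

For example, with `ngram_freq = {"abc": 1, "bcd": 10}` and `t1 = 5`, `cluster_string("abcd", ngram_freq, {}, 5, 3)` returns `{("abc",)}`: the rare trigram `abc` keys a cluster on its own, and the common trigram `bcd` adds nothing because its only partner is rare.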

2. (original) The method of claim 1 further including compiling n-gram statistics. 

3. (original) The method of claim 1 further including compiling n-gram pair statistics. 
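Claims 2 and 3 do not say how the statistics are compiled. One plausible reading, sketched here purely as an assumption, counts over a corpus how many strings contain each character n-gram and each unordered n-gram pair, so that a string contributes at most once per n-gram or pair. `compile_statistics` is a hypothetical name; its two counters play the role of the n-gram statistics and n-gram pair statistics of claims 1 to 3.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple


def compile_statistics(corpus: Iterable[str], n: int = 3) -> Tuple[Counter, Counter]:
    """Hypothetical helper: per-string counts of n-grams and unordered n-gram pairs."""
    ngram_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for s in corpus:
        # Assumption: character n-grams, each counted once per string.
        grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
        ngram_freq.update(grams)
        # Pairs are keyed by frozenset so that Ts-Tv and Tv-Ts share one count.
        pair_freq.update(frozenset(p) for p in combinations(grams, 2))
    return ngram_freq, pair_freq
```

Counting strings rather than raw occurrences keeps a single repetitive string from making an otherwise rare n-gram look common; raw counts would be an equally valid reading of the claims.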
4. (currently amended) A method implemented in a computer system, for clustering a plurality
of strings, each string including a plurality of characters, the method including: 

identifying unique n-grams in each string; and 

clustering each string with zero or more clusters associated with low frequency
n-grams from that string; and

concluding that (a) none of the unique n-grams are low frequency n-grams and that 
(b) one or more pairs of high frequency n-grams from the string are low 
frequency pairs and, in response, clustering each string with zero one or more 
clusters associated with low-frequency pairs of high frequency n-grams from that 
string. 

5. (currently amended) A The method of claim 4 further including implemented in a 
computer system, for clustering a plurality of strings, each string including a plurality of 
characters, the method including: 

identifying unique n-grams in each string; and 

concluding that (a) none of the unique n-grams are low frequency n-grams and that 
(b) no pairs of high frequency n-grams from the string are low frequency 
pairs and, in response, where a string does not include any low-frequency
pairs of high frequency n-grams, associating that string with clusters associated
with triples of n-grams including the pair. 
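Claims 4 and 5 describe a tiered scheme over many strings: a string goes to clusters for its low-frequency n-grams; if it has none, to clusters for low-frequency pairs of its high-frequency n-grams; and if there are no such pairs either, to clusters for triples of n-grams. The Python sketch below is one hedged reading of that scheme. It assumes character trigrams, count dictionaries for the statistics, `<=` for "low frequency", all triples drawn from the string's n-grams in the last tier, and a hypothetical `cluster_strings` helper that maps each sorted-tuple cluster key to its member strings.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Key = Tuple[str, ...]


def cluster_strings(
    strings: Iterable[str],
    ngram_freq: Dict[str, int],
    pair_freq: Dict[FrozenSet[str], int],
    t1: int,
    t2: int,
    n: int = 3,
) -> Dict[Key, List[str]]:
    """Hypothetical helper: tiered clustering of many strings (claims 4 and 5)."""
    clusters: Dict[Key, List[str]] = defaultdict(list)
    for s in strings:
        # Assumption: unique character n-grams, sorted so keys are canonical.
        grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
        low = [g for g in grams if ngram_freq.get(g, 0) <= t1]
        if low:
            # Tier 1: one cluster per low-frequency n-gram in the string.
            for g in low:
                clusters[(g,)].append(s)
            continue
        # Every n-gram is high frequency: look for low-frequency pairs (claim 4).
        low_pairs = [p for p in combinations(grams, 2)
                     if pair_freq.get(frozenset(p), 0) <= t2]
        if low_pairs:
            for p in low_pairs:
                clusters[p].append(s)
            continue
        # No low-frequency pair either: fall back to triples (claim 5).
        for t in combinations(grams, 3):
            clusters[t].append(s)
    return dict(clusters)
```

Because each tier is tried only when the previous one yields nothing, common strings end up in more specific (larger-key) clusters, which keeps any single cluster from absorbing most of the corpus.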
6. (previously presented) A method implemented in a computer system, for clustering a string,
the string including a plurality of characters, the method including:

identifying R unique n-grams T1...R in the string;
for every unique n-gram Ts:

if a frequency of Ts in a set of n-gram statistics is not greater than a first threshold:
clustering the string with a cluster associated with Ts;

otherwise:

for i = 1 to Y:
for every unique set of i n-grams Tu in the string T1...R, except s:
if the frequency of the n-gram set Ts-Tu is not greater than a second
threshold:
clustering the string with a cluster associated with the n-gram set
Ts-Tu;

if the string has not been associated with a cluster with this value of Ts:
for every unique set of Y+1 n-grams Tuy in the string T1...R, except s:
clustering the string with a cluster associated with the Y+2 n-gram
group Ts-Tuy,

where T1...R is a set of n-grams, R is the number of elements in T1...R, Ts and Tu are
members of T1...R, Tuy is a subset of T1...R, and i and Y are integers.
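Claim 6 generalizes claim 1 by trying n-gram sets of growing size. The sketch below is a hedged reading in Python, not the claimed implementation: n-grams are character trigrams, `set_freq` is an assumed dictionary from frozensets of n-grams to counts, and the condition "has not been associated with a cluster with this value of Ts" is interpreted as "no set Ts-Tu passed the second threshold during this Ts's iteration." The name `cluster_string_general` is hypothetical. With `y = 1` (claim 7) the fallback produces triples, as in claim 1, although this literal reading of claim 6 does not skip low-frequency partners Tu.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def cluster_string_general(
    s: str,
    ngram_freq: Dict[str, int],
    set_freq: Dict[FrozenSet[str], int],
    t1: int,
    t2: int,
    y: int,
    n: int = 3,
) -> Set[FrozenSet[str]]:
    """Hypothetical helper: cluster keys for one string under claim 6."""
    # Assumption: unique character n-grams stand in for T1...R.
    grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
    keys: Set[FrozenSet[str]] = set()
    for ts in grams:
        if ngram_freq.get(ts, 0) <= t1:
            keys.add(frozenset([ts]))
            continue
        others = [g for g in grams if g != ts]
        found = False  # whether this Ts produced any cluster in the i-loop
        for i in range(1, y + 1):
            for tu in combinations(others, i):
                group = frozenset((ts, *tu))
                if set_freq.get(group, 0) <= t2:
                    keys.add(group)
                    found = True
        if not found:
            # Fallback: Ts plus every set of Y+1 other n-grams (Y+2 in total).
            for tuy in combinations(others, y + 1):
                keys.add(frozenset((ts, *tuy)))
    return keys
```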
7. (original) The method of claim 6 where Y = 1.

8. (original) The method of claim 6 further including compiling n-gram statistics. 

9. (original) The method of claim 6 further including compiling n-gram group statistics. 

10. (currently amended) A computer program, stored on a tangible storage medium, for use in 
clustering a string, the program including executable instructions that cause a computer to: 

identify R unique n-grams T1...R in the string;
for every unique n-gram Ts:

if a frequency of Ts in a set of n-gram statistics is not greater than a first threshold:
cluster the string with a cluster associated with Ts;

otherwise: 

for every other n-gram Tv in the string T1...R, except s:
if concluding that the frequency of n-gram Tv is greater than the first
threshold, and in response:
if the frequency of an n-gram pair Ts-Tv is not greater than a second
threshold:

cluster the string with a cluster associated with the n-gram pair
Ts-Tv;

otherwise:

for every other n-gram Tx in the string T1...R, except s and v:

cluster the string with a cluster associated with an n-gram
triple Ts-Tv-Tx;

otherwise:
do nothing,

where T1...R is a set of n-grams, R is the number of elements in T1...R, and Ts, Tv,
and Tx are members of T1...R.
11. (original) The computer program of claim 10 further including executable instructions that
cause a computer to compile n-gram statistics. 

12. (original) The computer program of claim 10 further including executable instructions that 
cause a computer to compile n-gram pair statistics. 